跟踪牲畜的行为能够早期发现,从而预防现代动物农场的传染病。除了经济增益之外,这将减少畜牧业养殖的抗生素量,否则进入人类饮食恼怒的抗生素抗性的流行病 - 死亡的主要原因。我们可以使用标准的摄像机,在大多数现代农场提供,以监控牲畜。然而,大多数计算机视觉算法在这项任务上表现不佳,主要是因为(i)农场繁殖的动物看起来相同,缺乏任何明显的空间签名,(ii)没有现有的跟踪器对于长期保持稳健,并且(iii)真实 - 改变照明,频繁遮挡,不同的相机角度和动物尺寸的诸如变化的条件使得模型概括为概括。鉴于这些挑战,我们开发了针对小组母猪的端到端行为监测系统,以执行同时实例级分段,跟踪,动作识别和重新识别(星)任务。我们呈现StarFormer,这是第一个端到端的多目标牲畜监测框架,通过使用变压器架构了解分组猪的实例级嵌入式。对于基准测试,我们展示了一种仔细的策划数据集,包括视频序列,其中具有实例级界限框,实际室内养殖环境中的猪的分段,跟踪和活动分类。在明星任务上使用同步优化,我们展示了星际器优于培训的流行基线模型,为个人任务培训。
translated by 谷歌翻译
We introduce LaViLa, a new approach to learning video-language representations by leveraging Large Language Models (LLMs). We repurpose pre-trained LLMs to be conditioned on visual input, and finetune them to create automatic video narrators. Our auto-generated narrations offer a number of advantages, including dense coverage of long videos, better temporal synchronization of the visual information and text, and much higher diversity of text. The video-text embedding learned contrastively with these additional auto-generated narrations outperforms the previous state-of-the-art on multiple first-person and third-person video tasks, both in zero-shot and finetuned setups. Most notably, LaViLa obtains an absolute gain of 10.1% on EGTEA classification and 5.9% Epic-Kitchens-100 multi-instance retrieval benchmarks. Furthermore, LaViLa trained with only half the narrations from the Ego4D dataset outperforms baseline models trained on the full set, and shows positive scaling behavior on increasing pre-training data and model size.
translated by 谷歌翻译
Contrails, short for condensation trails, are line-shaped ice clouds produced by aircraft engine exhaust when they fly through cold and humid air. They generate a greenhouse effect by absorbing or directing back to Earth approximately 33% of emitted outgoing longwave radiation. They account for over half of the climate change resulting from aviation activities. Avoiding contrails and adjusting flight routes could be an inexpensive and effective way to reduce their impact. An accurate, automated, and reliable detection algorithm is required to develop and evaluate contrail avoidance strategies. Advancement in contrail detection has been severely limited due to several factors, primarily due to a lack of quality-labeled data. Recently, proposed a large human-labeled Landsat-8 contrails dataset. Each contrail is carefully labeled with various inputs in various scenes of Landsat-8 satellite imagery. In this work, we benchmark several popular segmentation models with combinations of different loss functions and encoder backbones. This work is the first to apply state-of-the-art segmentation techniques to detect contrails in low-orbit satellite imagery. Our work can also be used as an open benchmark for contrail segmentation and is publicly available.
translated by 谷歌翻译
Given a dataset of expert agent interactions with an environment of interest, a viable method to extract an effective agent policy is to estimate the maximum likelihood policy indicated by this data. This approach is commonly referred to as behavioral cloning (BC). In this work, we describe a key disadvantage of BC that arises due to the maximum likelihood objective function; namely that BC is mean-seeking with respect to the state-conditional expert action distribution when the learner's policy is represented with a Gaussian. To address this issue, we introduce a modified version of BC, Adversarial Behavioral Cloning (ABC), that exhibits mode-seeking behavior by incorporating elements of GAN (generative adversarial network) training. We evaluate ABC on toy domains and a domain based on Hopper from the DeepMind Control suite, and show that it outperforms standard BC by being mode-seeking in nature.
translated by 谷歌翻译
Deep Ensemble Convolutional Neural Networks has become a methodology of choice for analyzing medical images with a diagnostic performance comparable to a physician, including the diagnosis of Diabetic Retinopathy. However, commonly used techniques are deterministic and are therefore unable to provide any estimate of predictive uncertainty. Quantifying model uncertainty is crucial for reducing the risk of misdiagnosis. A reliable architecture should be well-calibrated to avoid over-confident predictions. To address this, we propose a UATTA-ENS: Uncertainty-Aware Test-Time Augmented Ensemble Technique for 5 Class PIRC Diabetic Retinopathy Classification to produce reliable and well-calibrated predictions.
translated by 谷歌翻译
Covid-19影响了世界各地,尽管对爆发的错误信息的传播速度比病毒更快。错误的信息通过在线社交网络(OSN)传播,通常会误导人们遵循正确的医疗实践。特别是,OSN机器人一直是传播虚假信息和发起网络宣传的主要来源。现有工作忽略了机器人的存在,这些机器人在传播中充当催化剂,并专注于“帖子中共享的文章”而不是帖子(文本)内容中的假新闻检测。大多数关于错误信息检测的工作都使用手动标记的数据集,这些数据集很难扩展以构建其预测模型。在这项研究中,我们通过在Twitter数据集上使用经过验证的事实检查的陈述来标记数据来克服这一数据稀缺性挑战。此外,我们将文本功能与用户级功能(例如关注者计数和朋友计数)和推文级功能(例如Tweet中的提及,主题标签和URL)结合起来,以充当检测错误信息的其他指标。此外,我们分析了推文中机器人的存在,并表明机器人随着时间的流逝改变了其行为,并且在错误信息中最活跃。我们收集了1022万个Covid-19相关推文,并使用我们的注释模型来构建一个广泛的原始地面真实数据集以进行分类。我们利用各种机器学习模型来准确检测错误信息,我们的最佳分类模型达到了精度(82%),召回(96%)和假阳性率(3.58%)。此外,我们的机器人分析表明,机器人约为错误信息推文的10%。我们的方法可以实质性地暴露于虚假信息,从而改善了通过社交媒体平台传播的信息的可信度。
translated by 谷歌翻译
全向视频中的光流估计面临两个重要问题:缺乏基准数据集以及调整基于视频的方法以适应全向性质的挑战。本文提出了第一个具有360度视野Flow360的感知上天然合成的全向基准数据集,其中有40个不同的视频和4,000个视频帧。我们在数据集和现有的光流数据集之间进行了全面的特征分析和比较,这些数据集表现出感知现实主义,独特性和多样性。为了适应全向性质,我们提出了一个新颖的暹罗表示学习框架(SLOF)。我们以对比度的方式训练我们的网络,并结合了对比度损失和光流损失的混合损失函数。广泛的实验验证了所提出的框架的有效性,并在最新方法中显示出40%的性能提高。我们的Flow360数据集和代码可在https://siamlof.github.io/上找到。
translated by 谷歌翻译
已经证明,基于光子微孔谐振器(MRR)硬件加速器可为处理深卷积神经网络(CNN)提供破坏性的加速和能源效率的改进。但是,以前基于MRR的CNN加速器无法为具有混合张量的CNN提供有效的适应性。此类CNN的一个例子是可分离的CNN。在这种不灵活的加速器上对CNN进行CNN的推断通常会导致低硬件利用率,从而降低了加速器的可实现性能和能源效率。在本文中,我们提出了一种在基于MRR的CNN加速器中引入可重构性的新方法,以使加速器硬件组件和使用硬件组件处理的加速器硬件组件和CNN张量之间的尺寸兼容性进行动态最大化。我们根据加速器中使用的硬件组件的布局和相对位置将基于最新的MRR的CNN加速器分为两个类别。然后,我们使用我们的方法在这两个类别中引入加速器中的可重构性,从而改善其并行性,有效映射不同尺寸的张量,速度和整体能源效率的灵活性。我们根据面积比例的前景(所有加速器的相等硬件区域)对可重构加速器进行了可重构加速器的评估。我们对四个现代CNN的推断的评估表明,与来自MRR基于MRR的基于MRR的加速器相比,我们设计的可重新配置CNN加速器可改善高达1.8倍,而FPS/W高达1.5倍。先前的工作。
translated by 谷歌翻译
基于变压器的体系结构已在各种视觉域(最著名的图像和视频)中变得更具竞争力。虽然先前的工作已经孤立地研究了这些模式,但拥有一个共同的体系结构表明,人们可以训练单个统一模型以多种视觉方式。事先尝试进行统一建模通常使用针对视觉任务量身定制的体系结构,或与单个模态模型相比获得较差的性能。在这项工作中,我们表明可以使用蒙版的自动编码来在图像和视频上训练简单的视觉变压器,而无需任何标记的数据。该单个模型学习了与图像和视频基准上的单模式表示相当或更好的视觉表示,同时使用了更简单的体系结构。特别是,我们的单一预算模型可以进行审核,以在ImageNet上获得86.5%的速度,而在挑战性的事物V2视频基准测试中,可以实现75.3%的范围。此外,可以通过丢弃90%的图像和95%的视频补丁来学习该模型,从而实现非常快速的训练。
translated by 谷歌翻译
Current approaches to multi-agent cooperation rely heavily on centralized mechanisms or explicit communication protocols to ensure convergence. This paper studies the problem of distributed multi-agent learning without resorting to centralized components or explicit communication. It examines the use of distribution matching to facilitate the coordination of independent agents. In the proposed scheme, each agent independently minimizes the distribution mismatch to the corresponding component of a target visitation distribution. The theoretical analysis shows that under certain conditions, each agent minimizing its individual distribution mismatch allows the convergence to the joint policy that generated the target distribution. Further, if the target distribution is from a joint policy that optimizes a cooperative task, the optimal policy for a combination of this task reward and the distribution matching reward is the same joint policy. This insight is used to formulate a practical algorithm (DM$^2$), in which each individual agent matches a target distribution derived from concurrently sampled trajectories from a joint expert policy. Experimental validation on the StarCraft domain shows that combining (1) a task reward, and (2) a distribution matching reward for expert demonstrations for the same task, allows agents to outperform a naive distributed baseline. Additional experiments probe the conditions under which expert demonstrations need to be sampled to obtain the learning benefits.
translated by 谷歌翻译